Abstract. An overview of the evolution of computing-oriented publications in high energy
physics following the start of operation of LHC. Quantitative analyses are illustrated, which 
document the production of scholarly papers on computing-related topics by high energy physics 
experiments and core tools projects, and the citations they receive. Several scientometric 
indicators are analyzed to characterize the role of computing in high energy physics literature. 
Distinctive features of software-oriented and hardware-oriented scholarly publications are 
highlighted. Current patterns and trends are compared to the situation in previous generations' 
experiments. 



1. Introduction 

Publications in scholarly journals establish the body of knowledge deriving from scientific 
research; they also play a fundamental role in the career path of scientists and in the evaluation 
criteria of funding agencies. 

A previous scientometric study [1] highlighted that software-oriented publications are
underrepresented with respect to hardware-oriented ones in the field of high energy physics
(HEP). The results of that analysis showed that the relative difference between the production
of scholarly literature in these areas had increased in the context of the experiments at LHC
(Large Hadron Collider) with respect to the previous generation's experiments at LEP (Large
Electron-Positron collider). The analysis in [1] was performed prior to the start of operation of
LHC.

The scientometric analysis summarized in this paper, which reflects a presentation on this 
topic at the CHEP (Computing in High Energy Physics) 2012 conference, reviews the publication 
patterns in HEP computing in greater detail, with special emphasis on their evolution since the 
beginning of LHC operation. 

2. Scope of the study 

The study summarized in this paper provides a quantitative overview of publication patterns in 
high energy physics over the past thirty years, with emphasis on software-oriented publications. 

The scientometric analysis is focused on a set of topics that are representative of software R&D
(research and development) in the context of HEP. The selection is far from exhaustive of the
wide variety of research activities in experimental high energy physics; rather, it is intended to
highlight some distinctive features of the literary production in the field.



The analysis concerns a representative sample of general software tools, which respond to
common needs of the HEP experimental community, and a sample of HEP experiments of the
current and past generations. Two widely used general software tools, Geant4 [2, 3] and ROOT
[4, 5], are the object of a detailed scientometric analysis. More limited investigations concern
the publications associated with other software tools contributing to the general computing
infrastructure of LHC experiments, such as the LHC Computing Grid. The four major
experiments at LEP, the ALICE, ATLAS, CMS, LHCb and TOTEM experiments at LHC, and
the BaBar experiment at the SLAC B-factory are included in the scientometric analysis.

The sample subject to evaluation consists of regular publications in established peer reviewed 
journals. Contributions to conference proceedings, books, institutional reports, items in preprint 
archives, white papers posted on web sites and software manuals are not considered. Some 
journals (e.g. Nuclear Instruments and Methods, NIM) also publish conference proceedings, 
usually in dedicated issues: these articles have been identified and excluded from the analysis. 

The examined scientometric indicators include the number of publications produced by the 
various subjects under study, their time distribution, the journals where they are published and 
their citation patterns. 

3. Data sources and analysis methods 

The main source for the scientometric analysis reported in this paper is Thomson-Reuters' Web
of Science [6], which is considered the most authoritative reference for bibliometric information 
in the academic environment. The authors' institutional subscription gives access to a subset 
of it, the "Science Citation Index Expanded" database; it does not include the "Conference 
Proceedings Citation Index". The database covers the period from 1970 to date. 

The access to a subset of the Web of Science generates an apparent mismatch between the 
total number of citations associated with a paper, which includes entries from the "Conference 
Proceedings Citation Index", and the actual number of citations available for analysis, which 
is limited to publications in journals belonging to the "Science Citation Index Expanded". A 
further complication for scientometric analysis is due to the incorrect classification of some 
publications listed in the "Science Citation Index Expanded" as "Conference proceedings": this
label is arbitrarily attributed by Thomson-Reuters to some regular articles in journals that never
publish conference proceedings (e.g. IEEE Transactions on Nuclear Science, TNS). Conversely,
some entries in the "Science Citation Index Expanded" that are not labeled as "Conference
proceedings" appear in journals (e.g. Nuclear Instruments and Methods) as contributions to
conference proceedings. These errors in the Web of Science have been manually corrected in
the analysis whenever possible: for instance, all papers published in IEEE Transactions on
Nuclear Science are considered in the analysis as regular journal publications, irrespective of
Thomson-Reuters' classification of some of them as conference proceedings, and papers published
in Nuclear Instruments and Methods A issues dedicated to conference proceedings have been 
removed from the analysis, even if they are not identified as "Conference proceedings" in the 
Web of Science. 
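
The manual corrections described above can be summarized as a simple filtering rule. The following Python fragment is only an illustrative sketch, not the procedure actually used for this study: the record fields, the document type strings and the flag marking proceedings issues are assumptions introduced for the example.

```python
from dataclasses import dataclass

# Journals treated as never publishing conference proceedings (TNS, as stated in the text).
ALWAYS_REGULAR = {"IEEE Transactions on Nuclear Science"}


@dataclass(frozen=True)
class Record:
    journal: str
    doc_type: str               # database label; the strings used here are assumed
    in_proceedings_issue: bool  # hypothetical flag set by manual inspection of the issue


def is_regular_article(rec: Record) -> bool:
    """Return True if the record is kept as a regular journal publication."""
    if rec.journal in ALWAYS_REGULAR:
        return True   # override the database label
    if rec.in_proceedings_issue:
        return False  # removed even if not labeled as proceedings
    return rec.doc_type != "Proceedings Paper"
```

With this rule a TNS paper labeled as proceedings is retained, while a paper in a proceedings issue of NIM A is dropped irrespective of its label.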

Other sources have been used to cross-check and complement the information derived from
the Web of Science: the web sites of the publishers of technological journals relevant to HEP
and the CERN Document System (CDS). The comparison of the data retrieved from these sources
has highlighted some omissions and inconsistencies in the data sample retrieved from
Thomson-Reuters' Web of Science: for instance, some papers published by LHC experiments, which are
listed in the CDS database, do not appear in the Web of Science, and the number of citations 
of a paper reported in the publisher's web site is in some cases inconsistent with that reported 
by the Web of Science. 

The publications by HEP experiments are distinguished into physics papers (i.e. publications
of experimental results representing the object of the experiment) and technological papers (i.e.
publications about the instruments and methods that contribute to produce the experimental
results). Technological publications are further identified as hardware-oriented, software-oriented
or dealing with data acquisition (DAQ) and trigger. This classification implies some degree of
subjectivity, which has been mitigated by performing cross-checks over the selections performed
by individual analysts. It is worthwhile to note that the classification of publications is part of
the regular professional practice of the authors of this paper, either as members of the Editorial
Board of a core journal in nuclear technology or as the person responsible for the library of a
major HEP laboratory.

The attribution of a paper to a given category is based on a variety of criteria. In some cases
the title of a paper or the journal where it is published unambiguously identifies its topic: for
instance, papers published in Physical Review D are all classified as physics papers. For most
papers, the record in the Web of Science, which also includes the abstract, provides sufficient
information to identify the scope of the paper and to classify it in one of the above-mentioned
categories. In cases where the attribution is not evident based on the information in the Web
of Science, the full text of the paper was evaluated.
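
The order in which these criteria are applied can be sketched as a cascade. The Python fragment below is a hypothetical illustration: the function and argument names are invented for the example, and the two labels stand for judgments made by an analyst, which the code does not attempt to automate.

```python
from typing import Callable, Optional

# Categories used in this study; "general" denotes papers on the whole detector.
CATEGORIES = {"physics", "hardware", "software", "DAQ-trigger", "general"}

# Journals whose papers are all classified as physics papers (example from the text).
PHYSICS_ONLY_JOURNALS = {"Physical Review D"}


def classify_paper(journal: str,
                   label_from_record: Optional[str],
                   label_from_full_text: Callable[[], str]) -> str:
    """Apply the criteria in order: journal, database record, full text."""
    if journal in PHYSICS_ONLY_JOURNALS:
        return "physics"
    # The full text is consulted only when the record is not conclusive (None).
    label = label_from_record if label_from_record is not None else label_from_full_text()
    if label not in CATEGORIES:
        raise ValueError(f"unknown category: {label}")
    return label
```

Passing the full-text evaluation as a callable reflects the fact that this more expensive step is performed only when the first two criteria fail.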

Complementary analyses based on the Web of Science and on publishers' web sites data, 
performed independently by different analysts, confirmed the robustness of the classification. 
Based on detailed cross-checks over selected samples, the uncertainty in the results reported in
this paper, which derives from intrinsic inconsistencies in the Web of Science and from subjective
classification of papers, is estimated to be of the order of a few percent. This level of uncertainty
does not critically affect the conclusions of this study.

The analysis reported in the following sections covers three decades of scientific literature 
(1982-2011); it is limited to papers published until the end of 2011 to ensure the reproducibility 
of results based on the Web of Science. Unless otherwise stated, the number of citations reflects
the status in the Web of Science as of 14 May 2012, i.e. one week prior to the beginning of the
CHEP 2012 conference.

4. General software tools 

Two software tools used by LHC experiments for simulation and data analysis have been 
evaluated: Geant4 and ROOT. 

Geant4 is documented in two reference publications [2, 3], which are brought to the attention
of the experimental community on the Geant4 web page. These papers have collected respectively
2934 and 574 citations (including citations from conference proceedings indexed by the Web
of Science); reference [2] crossed the threshold of 3000 citations shortly after the CHEP
conference (3037 citations by 18 June 2012). Reference [2] is the most cited publication in the
"Nuclear Science and Technology" category over the period considered in this scientometric
study. Excluding the Review of Particle Properties, it is the most cited paper produced by
CERN and by INFN.

The time distribution in figure 1 shows that most publications citing the earlier reference [2],
published in 2003, omit a citation to the more recent one [3], published in 2006.

Although the development of Geant4 was originally motivated by the requirements of
LHC experiments, the source of the citations to its reference paper [2] shows the widely
multidisciplinary character of its use in the scientific community. Figure 2 lists the journals
contributing the largest number of citations to [2]: it includes physics journals of various
scope (high energy physics, nuclear physics, astroparticle physics), nuclear technology journals,
medical physics and radiation protection journals, and a regional journal (published by a national
physics society).

One can observe in figure 3 that only a relatively small number of citations to [2] are associated
with LHC collaborations at this stage of their life-cycle. It is worthwhile to note that only
Figure 1. Number of citations collected by Geant4 reference papers [2, 3] as a function of time.



[Figure 2 plot: histogram of the number of citations per journal (axis: Citations, ticks at 100, 200, 300, 400). Journals listed: NIM A, Phys. Rev. D, TNS, Phys. Rev. Lett., Med. Phys., Phys. Med. Biol., Phys. Rev. C, Phys. Lett. B, NIM B, JINST, EPJC, Astrop. Phys., JHEP, J. Phys. G, Appl. Radiat. Isot., Radiat. Meas., J. Korean Phys. Soc., Radiat. Prot. Dosim.]

Figure 2. Journals citing Geant4 reference [2]; the colour codes in the plots are associated with
the scope of the journals (physics: violet, nuclear technology: green, medical physics: orange,
radiation protection: light brown, regional: blue). The journals listed in the histogram contribute
approximately 75% of the total number of citations to [2].
Figure 3. Experimental collaborations citing Geant4 reference [2]: LHC experiments (red),
other HEP experiments (blue), non-HEP experiments (yellow). The collaborations listed in the
histogram contribute approximately 16% of the total number of citations to [2].



approximately 20% of the citations to [2] listed in the Web of Science are formally associated
with a collaboration (identified as "Group Authors"); the vast majority of publications citing
[2] appear as the product of individual research groups, rather than of formal experimental
organizations. Figure 3 lists the collaborations that contribute the largest number of citations;
they correspond to approximately 75% of the citations associated with collaborations in the
Web of Science.
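
As a rough consistency check, the two rounded percentages quoted above can be combined to estimate the share of all citations to [2] represented by the collaborations listed in figure 3. The short Python calculation below is only an illustration; it uses the rounded values from the text rather than the underlying citation counts, and the variable names are chosen for the example.

```python
# Rounded shares quoted in the text (approximate values, not the underlying counts).
share_with_group_author = 0.20  # citations to [2] formally associated with a collaboration
share_listed_in_figure = 0.75   # part of those covered by the collaborations in figure 3

# Share of all citations to [2] represented by the listed collaborations.
share_of_total = share_with_group_author * share_listed_in_figure

print(f"{share_of_total:.0%}")  # prints 15%
```

Given the rounding of the inputs, the result is compatible with the value of approximately 16% quoted in the caption of figure 3.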

ROOT is documented in two reference publications [4, 5], published in 1997 and 2009. The
earlier one is a contribution to workshop proceedings, while the later one is a regular journal
publication. These papers have collected respectively 540 and 27 citations (including citations
from conference proceedings indexed by the Web of Science).

The time distribution in figure 4 shows a pattern similar to that in figure 1: the citation
to the more recent reference is omitted by most publications that cite the earlier one.

The citations to the earlier ROOT reference [4] have a multidisciplinary character, as is visible in
figure 5, although the relative contribution from various domains appears different for Geant4
and ROOT. The distribution of the domains of the citations listed in figures 2 and 5 is
summarized in table 1: citations to Geant4 appear equally distributed between physics and
nuclear technology journals, while nuclear technology journals are the most relevant source of
citations to ROOT; also, the fraction of citations from medical physics and radiation protection
journals is significantly larger for Geant4 than for ROOT. It is worthwhile to remind the reader
that table 1, similarly to figures 2 and 5, reflects the major sources of citations, amounting to
approximately 75% of total citations collected by Geant4 and ROOT main references.



5. Publications by HEP experiments 

The number of publications produced by the HEP experiments considered in this study is
plotted in figure 6. The plot distinguishes papers belonging to various categories: physics,
Figure 4. Number of citations collected by ROOT reference papers [4, 5] as a function of time.

Figure 5. Journals citing ROOT reference [4]; the colour codes in the plots are associated
with the scope of the journals (physics: violet, nuclear technology: green, medical physics:
orange, radiation protection: light brown, computing: grey). The journals listed in the histogram
contribute approximately 75% of the total number of citations to [4].



Table 1. Source of citations to Geant4 and ROOT main reference papers.

Figure 6. Papers published by the HEP experiments considered in this study, distinguishing
the contribution of various categories to the total count.



hardware, software, DAQ-trigger and general. Physics papers are the dominant component for
the experiments that have terminated their data-taking phase and are close to the end of their
lifecycle, while they represent a small fraction of the publications by LHC experiments, which
are in the early stage of their run. The last category ("general") includes papers describing the
whole detector, or the performance of some subsystems, which involve hardware and software aspects.

Software-related papers appear to be a small fraction of publications for all the experiments:
this trend is evident in figure 7, which shows the apportioning of technological papers across
the three categories of hardware, software and DAQ-trigger. The relatively smaller presence
of software publications in the production of LHC experiments is confirmed in a more detailed
analysis performed over the papers published since the start of LHC operation in 2008 in two
representative nuclear technology journals, NIM A and TNS, shown in figure 8.

The time distribution of the publications produced by the experiments considered in this
study is shown in figure 9. The horizontal scale of the plots takes as a reference the year when
LEP (1989), BaBar (1998) and LHC (2008) started running. Figure 9 shows both the total count
of papers produced per year, and the number of published papers per collaboration member
along the lifecycle of the experiments. Although the number of collaboration members is subject
to variation over the lifetime of an experiment, a constant number is assumed in this study, due to
the difficulty of ascertaining the number of collaboration members as a function of time for all
Figure 7. Technological papers published by the HEP experiments considered in this study:
fraction of hardware, software and DAQ-trigger publications.

Figure 8. Technological papers published by the HEP experiments considered in this study
in TNS (left) and NIM A (right) since 2008: hardware, software and DAQ-trigger publications.
The bins identified as LHC correspond to papers related to LHC, but not specifically associated
with any of the LHC experiments.



the experiments. The size of the LHC collaborations is assumed to be the number of members
reported in the CERN "Greybook" at the time of the CHEP conference; for LEP experiments and
BaBar the size of the collaboration was taken as the number of authors of their most cited paper.
The number of collaboration members assumed in this study is shown in figure 10.
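
The normalization described above amounts to counting papers per operation year and dividing by a constant collaboration size. A minimal Python sketch follows; the function name and its arguments are hypothetical, and the numbers in the usage note are invented for illustration, not taken from the data sample.

```python
from collections import Counter
from typing import Dict, Iterable


def papers_per_member(publication_years: Iterable[int],
                      start_year: int,
                      n_members: int) -> Dict[int, float]:
    """Map operation year (publication year minus start year) to papers per member.

    A constant collaboration size is assumed, as in the study.
    """
    if n_members <= 0:
        raise ValueError("collaboration size must be positive")
    counts = Counter(year - start_year for year in publication_years)
    return {op_year: n / n_members for op_year, n in sorted(counts.items())}
```

For instance, papers_per_member([2008, 2008, 2010], start_year=2008, n_members=4) yields {0: 0.5, 2: 0.25}; papers published before the start of run appear at negative operation years.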



The number of hardware, software and DAQ-trigger publications appears approximately
constant across the three generations of HEP experiments considered in this study, when it is scaled
to the collaboration size, as shown in figure 11. The ratio of hardware to software publications,
shown in figure 12, is also approximately constant across the experiments: hardware papers
outnumber software ones by approximately a factor of four. This result differs from that reported
in [1], which depicted an earlier stage of the lifecycle of LHC experiments, preceding the start
of LHC operation. The difference could also be partly explained by changes in the Web of




Figure 9. Papers published by the HEP experiments considered in this study as a function
of operation year: total count (left) and number of published papers per collaboration member
(right). Years are counted with reference to the start of run of LEP (1989), BaBar (1999) and
LHC (2008).

Figure 10. Size of the experimental collaborations considered in this study.



Science since the publication of [1], namely the move of a large number of conference papers to
a dedicated database, which excludes them from the analysis reported here.

Figure 13 illustrates the distribution of papers published by HEP experiments in physics and
technological journals. The histogram includes the journals collecting the largest number of
publications by the experiments considered in this study. One can observe in figure 14 that the
relative importance of some journals has evolved over the years in the field: among technological
journals, TNS has increased its popularity in the HEP domain in the last decade, while JINST
(Journal of Instrumentation) is growing rapidly.
Figure 11. Number of hardware, software and DAQ-trigger papers published by HEP
experiments, scaled by the number of collaboration members.

Figure 12. Ratio of hardware to software papers for a sample of HEP experiments.



The distribution of the number of citations collected by various categories of HEP
experimental papers is shown in figure 15: physics papers receive a larger number of citations
than technological papers. The fraction of physics papers that are not cited amounts to 4%,
while it is much larger for technological papers: 17% for hardware, 25% for software and 27%
for DAQ-trigger publications within the data sample examined in this study. Physics papers
include a larger number of references than technological papers, as appears in figure 16; the
different citation habits in these two domains are prone to affect the citation patterns shown in
figure 15.
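
The fraction of uncited papers quoted above for each category is the number of papers with zero citations divided by the number of papers in that category. The Python fragment below sketches this calculation under the assumption that the citation counts of a category are available as a list; the function name and the example numbers are illustrative only.

```python
from typing import Sequence


def uncited_fraction(citation_counts: Sequence[int]) -> float:
    """Return the fraction of papers in a category that received no citations."""
    if len(citation_counts) == 0:
        raise ValueError("empty sample")
    n_uncited = sum(1 for c in citation_counts if c == 0)
    return n_uncited / len(citation_counts)
```

For a hypothetical category with citation counts [0, 3, 0, 12], the function returns 0.5.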

The citations to the physics papers of the HEP experiments considered in this study come 
almost entirely from journals specialized in high energy physics or closely related fields, such 
Figure 13. Journals where the HEP experiments considered in this study published their
papers.



[Figure 14 plots: two panels, (a) 1982-1999 and (b) 2000-2011; horizontal axis: Journal. Legend: EPJC, JHEP, Nucl. Phys. B, Phys. Lett. B, Phys. Rev. D, Phys. Rev. Lett., Z. Phys. C, New J. Phys., EPL, Astrop. Phys., Phys. Rep., CPC, JINST, NIM A, NIM B, IEEE TNS.]

Figure 14. Journals where HEP experiments published their papers: in years 1982-1999 (left)
and 2000-2011 (right).

Figure 15. Number of citations to HEP experiments' publications: physics, hardware, software
and trigger/DAQ publications.



as nuclear physics and astroparticle physics: the journals contributing more than 90% of the
citations to physics papers published by representative LEP experiments (ALEPH and DELPHI)
and LHC experiments (ATLAS and CMS) are listed in figure 17. Technological papers published
by HEP experiments are cited by high energy and nuclear physics journals, by nuclear technology
journals and by review journals, as illustrated in figure 18. Unlike what is observed for the
general software tools examined in section 4, the papers published by HEP experiments do not
appear to collect a significant number of citations from other disciplines, such as medical physics
and radiation protection.

A large fraction of the citations collected by LHC technological publications consists of
self-citations (i.e. the citing papers include at least one of the authors of the cited work): this
pattern is illustrated in figures 19 and 20 for the two journals collecting the largest number of
technological publications by LHC experiments, NIM A and TNS.
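
The definition of self-citation given above corresponds to a non-empty intersection between the author lists of the cited and citing papers. The Python sketch below illustrates it; the function names are hypothetical, author names in the usage note are invented, and the simple normalization of names (case and surrounding blanks) is an assumption that would not resolve homonyms or variant spellings in real bibliographic data.

```python
from typing import Iterable, List, Set, Tuple


def _normalise(names: Iterable[str]) -> Set[str]:
    """Lower-case and strip author names (a simplistic matching rule)."""
    return {name.strip().lower() for name in names}


def is_self_citation(cited_authors: Iterable[str],
                     citing_authors: Iterable[str]) -> bool:
    """True if the citing paper shares at least one author with the cited paper."""
    return not _normalise(cited_authors).isdisjoint(_normalise(citing_authors))


def split_citations(cited_authors: Iterable[str],
                    citing_papers: List[List[str]]) -> Tuple[int, int]:
    """Return (number of self-citations, number of outside citations)."""
    cited = list(cited_authors)
    n_self = sum(is_self_citation(cited, authors) for authors in citing_papers)
    return n_self, len(citing_papers) - n_self
```

For a cited paper by ["A. Rossi", "B. Chen"] and citing papers by ["b. chen ", "C. Dupont"] and ["D. Novak"], split_citations returns (1, 1).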

The most cited publications produced by HEP experiments are in most cases the respective 
reference papers describing the whole detector: these papers are usually cited by the papers 
reporting the physics results of the experiment. In the data sample examined in this study, 
Figure 16. Number of references cited in HEP experiments' publications: physics, hardware,
software and trigger/DAQ publications.



excluding LHC experiments, the number of citations collected by the most cited paper varies
from 309 for the DELPHI experiment to 859 for the BaBar experiment; the citation statistics are
not yet meaningful for LHC experiments, which are at an early stage of their physics production.

6. Publications related to the LHC Computing Grid 

Grid computing is an essential component of the operation of LHC experiments: a large effort 
has been invested in the past decade to develop the grid computing infrastructure and several 
application tools used by LHC experiments within a project known as "LHC Computing Grid" 
(LCG). 

A search for publications associated with LCG in the Web of Science results in a small
sample, consisting of fewer than 20 papers. Grid computing has represented a large fraction of
the scientific program of the CHEP conference for the past decade, in addition to dedicated 
conferences. The small sample of journal publications related to LCG retrieved in the Web of 
Science suggests that only a limited fraction of conference presentations in this field evolves into 
Figure 17. Sources of citations to physics papers published by representative LEP experiments
(left) and LHC experiments (right); the histograms include more than 90% of the citations
received by the physics papers of the selected experiments.

Figure 18. Sources of citations to technological papers published by representative LEP
experiments (left) and LHC experiments (right): the histograms include more than 90% of
the citations received by the technological papers of the selected experiments.




Figure 19. Number of self-citations (left) and outside citations (right) to papers published by
LHC experiments in NIM A since 2008: a citation is considered a self-citation when the citing
paper includes at least one of the authors of the cited work, otherwise it is considered an outside
citation. The bins identified as LHC correspond to papers related to LHC, but not specifically
associated with any of the LHC experiments.

Figure 20. Number of self-citations (left) and outside citations (right) to papers published by
LHC experiments in TNS since 2008: a citation is considered a self-citation when the citing
paper includes at least one of the authors of the cited work, otherwise it is considered an outside
citation. The bins identified as LHC correspond to papers related to LHC, but not specifically
associated with any of the LHC experiments.



regular publications in scholarly journals. 

Due to the small sample size, a statistical analysis of LCG publications does not appear 
meaningful. 

7. Conclusions

The scientometric analysis reported in this paper provides a quantitative overview of publication 
patterns in HEP experiments, covering the last three decades. 

The analysis has confirmed the general trend observed in a previous study: software-related
papers are largely underrepresented with respect to hardware papers in the high energy physics
experimental environment. The ratio of hardware to software papers is approximately constant
over the experiments of the LEP and LHC generations.



Software papers collect on average fewer citations than hardware papers (and physics papers);
they also cite fewer references in their bibliography.

The analysis of citations to papers published by HEP experiments shows that both physics 
and technological papers collect the largest number of citations within the HEP environment; 
a small fraction of citations comes from closely related fields, such as nuclear and astroparticle 
physics. 

General software tools motivated by the requirements of HEP experiments, such as Geant4
and ROOT, exhibit different patterns. The earlier Geant4 reference [2] has received more than
3000 citations at the time of writing this paper: it is a landmark paper in Thomson-Reuters'
"Nuclear Science and Technology" category, and the most cited publication for major institutions
such as CERN and INFN. The analysis of the citations collected by these software tools shows
their multidisciplinary character: they appear to be used in a variety of experimental
fields not limited to HEP. Geant4 is cited by a large number of physics papers, which confirms
its significant role in the production of physics results by HEP experiments in the LHC era.